
SYSTEM AND METHOD FOR TEACHING 
WRITING USING MICROANALYSIS OF TEXT 

BACKGROUND OF THE INVENTION 

1. Field of Invention 

This invention is related to machine mediated instruction and computer based
learning systems.

2. Description of Related Art 

Current writing analysis involves human writer review processes. These
processes include an instructor or a peer reviewing a written work. These processes
can also be computer enhanced by enabling the review process to take place over a
computer network, via, for example, email. Alternatively, the review process can take
place in computer-enhanced, collaborative-peer-review environments, in which other
participants critique the written work.

These writing analysis processes suffer from a number of problems. In writing
workshops, the instructor reviews the text from the perspective of the target audience.
The instructor provides feedback on problematic language. The instructor often
indicates a sentence is incorrect but usually does not provide a reader-centric
microanalysis of the text explaining precisely why the text is incorrect.

In fact, reviewers tend to focus on two levels of structure in the review. Since
reviewers feel comfortable with grammar checking, the reviewers might perform
grammar checking for the entire text. Less frequently, vague or ambiguous sentences
may be identified. However, reviewers do not explain why a sentence is vague or
ambiguous. The location of the problem is identified and the user is expected to
understand how to correct the problem.

A second problem stems from the stylistic preferences of each reviewer. A
reviewer's subjective style preferences will in turn affect the reviewer's critique of the
written work. Thus, instead of giving the writer an objective indication of how well the
work communicated the desired information to the reader, the reviewer may focus on
subjective stylistic preferences.

Further, the sheer volume of material to be reviewed tends to force a reviewer
to focus on relatively easier problems. For example, the writer's overall organization
may be critiqued with only the occasional exemplary sentence construction analyzed
in detail.

Outlining tools, such as the "outline view" in the Microsoft Word® word 
processor application, as well as Xerox Corporation's "Notecards" and EastGate 
Corporation's "Story Space" tools, help the writer frame higher level concepts and 
issues. The writer may then concentrate on developing each of the ideas within a 
framework. However, these tools do not address the structure and organization of the 
text below the concept and topic level. Since these tools do not support analyzing 
how the units of meaning interrelate, they cannot be used to show the writer how to 
improve the text below the concept and issue level. 

Grammar checking tools, such as "Grammatick II" and the grammar tool in 
Microsoft Word®, apply statistical formulas to a selected text to determine readability 
based on a variety of different formulae, such as the Flesch Reading Ease Index or the 
Flesch-Kincaid Grade Level index. These tools also provide sets of rules that can be 
applied to a written text to identify run-on sentences, sentence fragments, archaic 
expressions and gender-specific expressions. However, these tools merely indicate 
whether a text satisfies the rules. Though these tools may provide suggestions for 
improving a text, based on the identified rule violations, these tools do not provide a 
structural representation of the text. Therefore, these tools cannot show the user how 
to improve the relationships of meaning between the units of text. 

Text analysis tools, as described in Marcu, "The Rhetorical parsing of natural
language texts", Proceedings of the 35th Annual Meeting of the Association for
Computational Linguistics and the 8th Conference of the European Chapter of the
Association for Computational Linguistics, Madrid, July, 1997, 96-103, can provide a
structural representation of a text based on an analytic framework. The system
described, for example, is directed to the analysis of texts for the purpose of improved
discourse level automatic natural language processing. It is not directed to
improving the structure or style of the text or to educating the writer in how to improve
the organization of future prose works that might be produced.

Rhetorical Structures Theory is a highly complex theory. The basic units of
analysis have never been articulated clearly and the several variations of Rhetorical
Structures Theory differ from one another in the basic relations between units that
they employ. The relations themselves are very complex, overlapping and
ambiguous. For example, classical Rhetorical Structures Theory as developed by
Mann and Thompson (1988) includes at least ten presentational relations, five
multinuclear relations and fifteen subject matter relations. Training coders is a very
lengthy task, requiring weeks of intensive study and supervision. Strong differences
of opinion arise among experienced coders about the relationships which link units
together. Intercoder reliability is very low. Depending on the level of
analysis chosen by different coders, quite different structural trees labeled with
Rhetorical Structures relations may be built. Therefore, different Rhetorical
Structures Theory analyzers may produce significantly different structural
representations from the same text. These factors taken together pose particular
problems when attempting to apply Rhetorical Structures Theory in a learning
environment where the goal is to help students improve their written communicative
skills by applied text micro-analysis.

The Summarist system, as discussed by Hovy and Lin, "Automated Text
Summarization in SUMMARIST" in Proceedings of the Workshop of Intelligent
Scaleable Text Summarization, July 1997, uses statistical techniques, along with
symbolic world knowledge of word meaning based on dictionaries, in attempting to
discern a writer's intent. Since the Summarist system uses statistical techniques to
identify important keywords, the Summarist system only produces topical keyword
summaries.

Thus, these conventional natural language processing systems attempt to
identify intended meaning in a text corpus. These systems do not exploit linguistic
constraint information provided in the text but instead rely on statistical analysis and
word frequency counts. A determination is made, from this statistical information, as
to what information the author intended to convey. This information is then used to
facilitate queries.

SUMMARY OF THE INVENTION

However, the conventional systems described above do not provide a reader-centric
analysis of the text and fail to show the writer why the text is incorrect or to
provide a basis for determining how to fix the text. Such a reader-centric
microanalytic system would be a valuable tool. Instead, these conventional systems
focus on the intended meaning of the writer. Accordingly, systems and methods that
allow a user to analyze a text corpus from the reader's perspective, and to interactively
view how changes made to the written text change the summary to more closely match
the desired communicative goal of the written piece, are desirable.

This invention provides writing instruction systems and methods for teaching
writing skills to a student using an objective reader-centric microanalysis of the text.

This invention separately provides systems and methods that teach expository 
writing skills using a "virtual reader". A "virtual reader" is an objective reader that 
follows set rules to deterministically identify the information conveyed by a passage 
of text. 

This invention separately provides systems and methods for designating 
important topical information within the text to be analyzed by highlighting or tagging 
the topical words or phrases. 

In various exemplary embodiments, the writer segments the text into discrete 
units of meaning consistent with a theory of discourse analysis. The segmented text 
may then be analyzed by a reviewer or a reviewing system. The reviewer or the 
reviewing system can provide comments to the user based on the analysis. The 
comments can identify errors and/or suggest improvements. The writer can then 
revise the segmented text based on the comments. 

In various exemplary embodiments, the student classifies the units of meaning 
by type. The classification may then be reviewed by the reviewer or reviewing 
system. Following this review by the reviewer or reviewing system, the writer then 
creates a discourse structural representation that reflects the interrelationships 
between the units of meaning, consistent with a selected theory of discourse. 
Optionally, the information designated by the user as important can be displayed in 
the discourse structure. 

An automatically generated summary is optionally created from the discourse 
structural representation. The summary can then be compared to the topic 
information designated by the writer as important. The degree of coverage is 
conveyed as objective feedback to the writer as to how well the text would have 
conveyed the desired information to the reader. In various exemplary embodiments, 
the reader centric text analysis systems and methods according to this invention use 
the Linguistic Discourse Model, although other theories of discourse may be used in 
place of the Linguistic Discourse Model.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be described with reference to the accompanying drawings,
in which like elements are labeled with like numbers, and in which:

Fig. 1 shows a block diagram of an exemplary embodiment of the invention; 

Fig. 2 shows an exemplary flowchart of the invention; 

Fig. 3 shows an exemplary flowchart of the summary algorithm; and 

Fig. 4 shows an exemplary discourse structural representation of text.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

Fig. 1 shows an exemplary embodiment of an expository writing teaching 
system 200. 

The expository writing teaching system 200 includes a controller 220
connected to an input/output interface 210 and a memory 230. The memory 230 can
include one or more of a text portion 231, a segment portion 232, a text building units
portion 233, a summary portion 234 and a tree portion 235. The input/output interface
210 is connected by a link 215 to one or more input devices 214 and to a display
device 212. The input/output interface 210, the controller 220 and the memory 230
are connected to an optional concept highlighting circuit or routine 240, a segmenting
circuit or routine 250, an analyzing circuit or routine 260, a tree building circuit or
routine 270, an optional summary generating circuit or routine 280, an optional
concept comparator circuit or routine 290 and a reviewing circuit or routine 295 by a
data/control bus 205.

A link 110 connects the expository writing teaching system 200 to other 
devices or components. It will be apparent that the link 110 may also be used to 
permit remote access to the expository writing teaching system 200 for a user or a 
reviewer or reviewing system. The link 110 may be any known or later developed 
device or system for connecting to the expository writing teaching system 200, 
including a direct cable connection, a connection over a wide area network or a local 
area network, a connection over an intranet, a connection over the Internet, or a 
connection over any other distributed processing network or system. In general, the
link 110 can be any known or later developed connection system or structure usable to
connect a user, a reviewer or a reviewing system to the expository writing teaching
system 200.





The controller 220 stores information in the memory 230. The memory 230
may be implemented using a hard disk and disk drive storage, a floppy disk storage, a
re-writable optical disk storage, bubble memory, flash memory, static and/or dynamic
RAM, battery-backed RAM, or the like. The controller 220 displays information on
the display 212 and accepts user input via the one or more user input devices 214.
User inputs to the expository writing teaching system 200 may include one or more of
voice input, keyboard input, touch sensitive screen input or any method of entering
information into the expository writing teaching system 200.

The user calls up a text to be analyzed from the text memory 231.
Alternatively, the user may use any word processor, text processing application or text
editor to create a text file to be analyzed. In various exemplary embodiments, once
the text has been obtained, whether by importing the text from an existing file in the
text memory 231 or by creating the text using a text editor, the optional concept
highlighting circuit or routine 240 is used to highlight concepts in the text that the user
believes to be important. However, it should be appreciated that, in place of, or in
addition to, highlighting, other methods for designating important concepts, such as
XML tagging, typing the concepts into a pop-up dialog box, or any other known or
later developed technique for identifying the important concepts, may be used. It will
be appreciated that the expository writing teaching system 200 may also be used
without any highlighting.

The user then segments the text into text building units by activating the
segmenting circuit or routine 250. Different linguistic theories build discourse from
different basic text building units. In various exemplary embodiments, the Linguistic
Discourse Model is used to analyze the text and categorize the text building units. The
Linguistic Discourse Model used in these exemplary embodiments provides for three
distinct types of text building units. Each text building unit represents the smallest
unit of meaning in the text. A sentence may have, or be formed from, a single text
building unit. However, a sentence is more likely to be formed from several text
building units. Using editing features of the segmenting circuit or routine 250, the
user breaks up the text to be analyzed into the text building units. In various
exemplary embodiments, a line break is inserted between each text building unit.
This creates a list of text building units, with one text building unit on each line.
However, it should be appreciated that delimiting one text building unit from another
can use any known or later developed delimiting technique. Whether the text is
segmented by the segmenting circuit or routine 250 or directly by the user, the result is
the same list of text building units.
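
As a minimal sketch of the line-break delimiting described above, the following Python fragment turns a user-segmented text into a list of text building units. The function name `split_into_units` and the sample sentence are hypothetical and chosen only for illustration; the specification does not prescribe any particular implementation.

```python
def split_into_units(segmented_text):
    """Return one text building unit per non-empty line of the segmented text."""
    # Assumption: the user has already placed a line break between units.
    return [line.strip() for line in segmented_text.splitlines() if line.strip()]


# Hypothetical sample: one sentence broken into three units by the user.
segmented = "When the bell rang,\nthe students left the room\nbecause the class was over."
units = split_into_units(segmented)
```

Blank lines are ignored in this sketch so that extra spacing in the editor does not create empty units.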

The list of text building units to be analyzed is then, under control of the
controller 220, provided to the analyzing circuit or routine 260. In the analyzing
circuit or routine 260, additional data fields are exposed to the user to allow the user
to assign a text building unit type to each text building unit. The user may select a
text building unit type for each text building unit based on the theory of discourse
chosen to be used for instruction. The determined text building units are then stored in
the text building units memory portion 233.

There are many theories of discourse analysis that may be employed in the
writing instruction systems and methods according to this invention. Such theories
include, but are not limited to, the Discourse Structures Theory, the Rhetorical
Structure Theory, the Systemic Functional Grammar and Tagmemics.

Various exemplary embodiments of the writing instruction systems and
methods according to this invention use the Linguistic Discourse Model theory of
discourse analysis. As indicated above, the Linguistic Discourse Model provides for
three types of text building units: content text building units; modifier text building
units; and operator text building units. Content text building units express some state,
action or belief that is an accomplishment or property of some entity. Content text
building units have a verb at their core, whether expressed or unexpressed. Modifier
text building units provide context setting information for information encoded further
along in the text. Operator text building units provide commentary on aspects of text
organization or logical structure or may express emotional reaction to something in
the context. Each text building unit is assigned to one of these categories.
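
The three unit types and the per-unit data fields could be modeled along the following lines. This is only an illustrative sketch: the names `UnitType`, `TextBuildingUnit` and `unassigned`, and the sample type assignments, are assumptions and do not come from the specification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class UnitType(Enum):
    CONTENT = "content"    # state, action or belief of some entity
    MODIFIER = "modifier"  # context setting information for later text
    OPERATOR = "operator"  # commentary on organization, logic or emotion


@dataclass
class TextBuildingUnit:
    text: str
    unit_type: Optional[UnitType] = None  # None until the user assigns a type
    user_comment: str = ""                # optional note or question for the reviewer


def unassigned(units):
    """Return the units that still need a type assignment."""
    return [u for u in units if u.unit_type is None]


# Hypothetical sample list; the assignments are illustrative only.
units = [
    TextBuildingUnit("In the morning,", UnitType.MODIFIER),
    TextBuildingUnit("the committee approved the budget", UnitType.CONTENT),
    TextBuildingUnit("However,"),
]
```

Keeping the type optional lets the same record represent partially completed work that the user saves and later resumes.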

The user may then submit the user's work, i.e., the list of text building units,
along with the associated text building unit type assignments, to the reviewer or the
text reviewing system. The reviewer may use the controller 220, the display 212 and
the one or more input devices 214. In this case, the expository writing teaching
system 200 becomes the reviewing system while used by the reviewer to review the
user's work. Alternatively, the reviewer or reviewing system may use a different
controller connected via a network (not shown) through the input/output interface 210
and link 110 to access the user's work and to review the text building unit type
assignments using the reviewing circuit or routine 295.

When the reviewer, using the reviewing system, views the user's work, an
additional field for the reviewer's comments is presented by the reviewing circuit or
routine 295 so that the reviewer may respond to the user's comments. The reviewer then
analyzes the text building unit types assigned by the user, and provides corrections
and/or comments as necessary.

This review may also be performed automatically by a reviewing system that
automatically recognizes and assigns a text building unit type to each text building
unit. In this case, the comments can be generated based on the user's comments and
the differences between the user's work and a "correct" answer. For example, if the
reviewing system detects that the user has entered a comment and further detects that
the corresponding text building unit has been assigned an inappropriate text building
unit type, a system comment may be generated and placed in the reviewer comment
field to more fully explain why the user's selection was an inappropriate selection.
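
The comparison between the user's assignments and a "correct" answer might be sketched as follows. The function `review_assignments`, the wording of the generated notes and the sample data are hypothetical; how a real reviewing system would derive the reference types is outside the scope of this sketch.

```python
def review_assignments(user_types, reference_types, user_comments):
    """Return {unit index: reviewer comment} for units whose type looks wrong.

    user_comments maps a unit index to the comment the user entered, if any.
    """
    reviewer_comments = {}
    for index, (given, expected) in enumerate(zip(user_types, reference_types)):
        if given == expected:
            continue  # assignment matches the reference answer
        note = f"Assigned '{given}', but '{expected}' fits this unit better."
        if user_comments.get(index):
            # The user asked about this unit, so add a fuller explanation.
            note += " In reply to your comment: review the definition of this unit type."
        reviewer_comments[index] = note
    return reviewer_comments


# Hypothetical sample: the second unit is misclassified and carries a question.
result = review_assignments(
    ["content", "content", "operator"],
    ["content", "modifier", "operator"],
    {1: "Is this really a content unit?"},
)
```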

After the reviewer or reviewing system has saved the work to the segment
portion 232, the user may then re-open the work. The work is then processed, under
control of the controller 220, by the tree building circuit or routine 270. The tree
building circuit or routine 270 creates a node for the first text building unit of the text
and begins by defining this node as the root node of the discourse tree. As additional
fields are exposed to the user, the relation fields identify the type of constructed node
used to link the text building units into the discourse tree. The node fields allow the
user to select an insertion point in the discourse tree for the text building unit that is
being analyzed. The text building units are added to the structural representation of
the discourse consistent with the particular theory of discourse used in the expository
writing teaching system 200. That is, depending on the particular theory of discourse,
a particular text building unit can be added to the structural representation of
discourse at different points and/or in different ways. In the Linguistic Discourse
Model used in various exemplary embodiments of the expository writing teaching
system 200, the structural representation is an open right tree.

In various exemplary embodiments, the Linguistic Discourse Model provides
three relationships between the text building units: 1) coordinations; 2)
subordinations; and 3) binaries. A coordination relationship between two text
building units exists when a second text building unit continues a discourse activity
that was previously begun or continued by a first text building unit. In this case, the
second text building unit is linked to the discourse tree and the first text building unit
with a coordination node.

In particular, in various exemplary embodiments, if the relation between the
second text building unit to be added and the first text building unit, represented by a
first node in the structural representation or tree, is a coordination relationship, the first
or existing node is replaced by a new coordination node and the first node becomes a leaf
node of the new coordination node. The new coordination node links the first node
representing the first text building unit as a left child node of the new coordination
node into the discourse tree. A second node representing the second text building unit
to be added is then linked to the new coordination node as the right child node of the
new coordination node. In some cases, a subsequent text building unit may extend the
activity carried out by text building units already related to one another in the
discourse tree as siblings under a coordination node. In this case, a third node
representing a third text building unit will be added under the existing coordination
node as a new right child node. The second node then becomes a middle child node.

If the relation between the text building units is not a coordination 
relationship, the second text building unit is analyzed to determine if that text building 
unit elaborates upon or interrupts an active discourse in the discourse tree. A 
discourse is active if it has a relationship node or a text building unit that is located 
along the right edge of the discourse tree. If the second text building unit elaborates 
upon, or interrupts an active discourse, the relationship between the second text 
building unit and the previous text building unit is defined as a subordination 
relationship. In this case the first node is replaced by a new subordination node and 
the first node becomes a leaf node of the subordination node. The second node 
representing the second text building unit is linked into the discourse tree as the right 
child node of the new subordination node. 

If the relationship between the second text building unit and the first
text building unit forming the right edge of the discourse tree is neither a coordination
nor a subordination, then the second text building unit is classified as having a binary
relationship with the first text building unit in the discourse tree. The first or existing
node is replaced by a new binary node and the first node becomes a leaf node of the
binary node. The second node representing the second text building unit is then added
as the right child node of the new binary node.
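
The node replacement steps described for coordination, subordination and binary relationships can be sketched as a single attachment routine. The class `Node`, the functions `attach` and `show`, and the unit labels `u1` to `u4` are all hypothetical names introduced for illustration; this is one possible reading of the description, not the circuit or routine 270 itself.

```python
class Node:
    def __init__(self, label, relation=False):
        self.label = label        # unit text for a leaf, relation name otherwise
        self.relation = relation  # True for constructed relation nodes
        self.children = []
        self.parent = None

    def add_child(self, child):
        child.parent = self
        self.children.append(child)


def attach(root, target, unit_text, relation):
    """Attach a new unit at `target` with the given relation; return the root."""
    new_leaf = Node(unit_text)
    # Extending an existing coordination: the new unit becomes the new right child.
    if relation == "coordination" and target.relation and target.label == "coordination":
        target.add_child(new_leaf)
        return root
    # Otherwise replace `target` by a new relation node; the old node becomes
    # its left child and the new unit its right child.
    relation_node = Node(relation, relation=True)
    parent = target.parent
    if parent is None:
        root = relation_node
    else:
        parent.children[parent.children.index(target)] = relation_node
        relation_node.parent = parent
    relation_node.add_child(target)
    relation_node.add_child(new_leaf)
    return root


def show(node):
    """Compact text form, e.g. C(u1, u2) for a coordination of u1 and u2."""
    if not node.relation:
        return node.label
    return node.label[0].upper() + "(" + ", ".join(show(c) for c in node.children) + ")"


root = Node("u1")
root = attach(root, root, "u2", "coordination")
root = attach(root, root, "u3", "coordination")                    # extends the coordination
root = attach(root, root.children[-1], "u4", "subordination")      # u4 elaborates on u3
```

In this sketch the caller chooses the target node, which mirrors the user selecting an insertion point; deciding which relation applies remains the user's analytic task.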

In the embodiment using the Linguistic Discourse Model, the user progresses
through the text, and the structural representation of discourse is an open right tree.
Accordingly, the nodes representing the text building units are added to the discourse
tree only at the right edges of the discourse tree. As a result, the potential insertion
points where the next text building unit can be added to the discourse tree can be
easily determined and presented to the user. In various exemplary embodiments, the
list of insertion points can be displayed to the user in the form of a constrained drop
down selection box. Of course, alternate theories of discourse may utilize different
structural representations of discourse.
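
Because units are added only along the right edge, the candidate insertion points can be listed by walking from the root through each rightmost child. The sketch below assumes a simple nested-dictionary tree with hypothetical keys `label` and `children`; the function name `right_edge` is likewise an assumption.

```python
def right_edge(tree):
    """Return the labels on the right edge, from the root down to the last leaf."""
    edge = []
    node = tree
    while node is not None:
        edge.append(node["label"])
        children = node.get("children", [])
        node = children[-1] if children else None  # follow the rightmost child
    return edge


# Hypothetical tree: a coordination of u1, u2 and a subordination of u4 under u3.
tree = {"label": "coordination", "children": [
    {"label": "u1"},
    {"label": "u2"},
    {"label": "subordination", "children": [{"label": "u3"}, {"label": "u4"}]},
]}
insertion_points = right_edge(tree)
```

The resulting list is what a constrained drop down selection box could present to the user.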

Building the organizational structure using the rules of a discourse theory,
such as the Linguistic Discourse Model, allows the user to gain a better understanding
of the relationships between the text building units in the user's text. During building
of the organizational tree, the user is prompted to understand how the text building
units must interrelate in order to convey the desired information to a reader. This tree
building process increases the user's understanding of the communicative aspects of
the user's own writing. If an ambiguity exists regarding the appropriate insertion
point for a text building unit, this ambiguity highlights the ambiguity the reader will
experience with respect to the same unit of meaning represented by that next text
building unit. Alternatively, tree building may be accomplished automatically. A
discourse tree is built reflecting the organization of the text according to the discourse
theory being used.

The user may then activate the summary generating circuit or routine 280.
Alternatively, the summary generating circuit or routine 280 may be automatically
invoked after the tree building circuit or routine 270 has completed processing. The
user then selects the level of summary L to be generated. The value for the summary
level L ranges from a lowest level, for example 0, representing the most specific level,
to a highest level P, the most general level. The value of the most general level P
reflects the maximal number of subordination levels in the discourse tree. The
summary generating circuit or routine 280 takes the structural representation of
discourse, which in the various exemplary embodiments employing the Linguistic
Discourse Model is an open right tree generated by the tree building circuit or routine
270, as input, along with the summary level L indicated by the user. The summary
generating circuit or routine 280 generates a summary of the structural representation
for the specified summary level L.
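
The summary algorithm itself is the subject of Fig. 3 and is not reproduced in this passage. Purely as an illustration of how a level L between 0 and P could select material from the tree, the following sketch assumes one plausible pruning rule: a unit's depth is the number of subordinations it sits under as a right-hand child, and level L keeps the units whose depth is at most P minus L. The rule, the nested-dictionary tree format and the names `unit_depths` and `summarize` are all assumptions.

```python
def unit_depths(node, depth=0, out=None):
    """Collect (unit text, subordination depth) pairs in left-to-right order."""
    if out is None:
        out = []
    children = node.get("children", [])
    if not children:
        out.append((node["label"], depth))
        return out
    for i, child in enumerate(children):
        # Assumption: only the right-hand children of a subordination node
        # count as one level deeper than the node itself.
        deeper = node["label"] == "subordination" and i > 0
        unit_depths(child, depth + (1 if deeper else 0), out)
    return out


def summarize(tree, level):
    """Keep units whose depth is at most P - level (assumed rule, see lead-in)."""
    depths = unit_depths(tree)
    max_depth = max(d for _, d in depths)  # plays the role of P
    return [text for text, d in depths if d <= max_depth - level]


# Hypothetical tree: u4 elaborates on u3, so u4 is one level deeper.
tree = {"label": "coordination", "children": [
    {"label": "u1"},
    {"label": "u2"},
    {"label": "subordination", "children": [{"label": "u3"}, {"label": "u4"}]},
]}
```

Under this assumed rule, level 0 returns every unit and the highest level returns only the units that are not subordinated to anything.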

After the summary is generated by the summary generating circuit or routine
280, the optional concept comparator circuit or routine 290 may be activated. The
concept comparator circuit or routine 290 may be automatically activated when the
summary generating circuit or routine 280 completes processing or may be invoked
by the user in a separate action. The concept comparator circuit or routine 290
operates on the output of the concept highlighting circuit or routine 240 and the
summary generating circuit or routine 280. The concept comparator circuit or routine
290 determines how much of the information identified by the concept highlighting
circuit or routine 240 is covered by the summary generated by the summary generating
circuit or routine 280. To do so, the concept comparator circuit or routine 290 checks
which of the words and/or phrases output by the concept highlighting circuit or
routine 240 also appear in the output of the summary generating circuit or routine 280.

The user may then receive a numerical ratio or a percentage, or any other
suitable measure or visualization, that indicates the amount of coverage the text
provides to the identified words and/or phrases. The ratio indicates how many of the
important words or phrases were conveyed to the virtual reader as indicated by the
summary generated by the summary generating circuit or routine 280. The coverage
measure may also be conveyed as feedback to the user by displaying the important
concepts identified by the concept highlighting circuit or routine 240. The words and/or
phrases output by the concept highlighting circuit or routine 240 but not found in the
summary generated by the summary generating circuit or routine 280 are then displayed
in a distinguishing color, such as red. In this way, the user's attention is immediately
drawn to the problem areas in the text. Alternatively, the user may optionally view
the highlighted portions of the text in a visualization of the structural representation of
the discourse.
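
A coverage ratio of this kind could be computed as sketched below. The function name `coverage`, the case-insensitive substring matching and the sample data are assumptions made for illustration; the specification leaves the matching method open.

```python
def coverage(concepts, summary):
    """Return (ratio of concepts found in the summary, list of missing concepts)."""
    summary_lower = summary.lower()
    # Assumption: a concept counts as covered if it occurs verbatim,
    # ignoring case, anywhere in the summary text.
    missing = [c for c in concepts if c.lower() not in summary_lower]
    if not concepts:
        return 0.0, []  # nothing was designated, so there is nothing to cover
    ratio = (len(concepts) - len(missing)) / len(concepts)
    return ratio, missing


# Hypothetical sample: three designated concepts, two of which survive.
ratio, missing = coverage(
    ["budget", "committee", "deadline"],
    "The committee approved the budget.",
)
```

The `missing` list corresponds to the words or phrases that would be shown in a distinguishing color.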

Fig. 2 is a flowchart outlining one exemplary embodiment of a method for
analyzing a written text according to this invention. Beginning in step S100, control
continues to step S110, where the user may import a text into a text editing
workspace. It will be apparent that the text may be entered directly into the system or
the text may be entered into the system using any other method including text
recognition of a printed or handwritten text, a voice entry system, a text editor
such as EMACS or the UNIX editor vi, a word processor such as Microsoft
Corporation's Word® or any other known or later developed method or technique.
The user may further edit a previously created text or may use the text editing
workspace to generate new text for analysis.

Next, in optional step S120, portions of the text that convey important
concepts may be highlighted or tagged. This may be accomplished by selecting
relevant portions of the text and designating the selected portions of the text as
important by using a drop down menu or a property dialog box, or by including tags
such as XML (Extensible Markup Language) tags to delimit the important conceptual
information. The important concepts may also be designated in a separate step
through direct input of the relevant phrases found in the text into pop-up dialog box
fields. It should also be appreciated that step S120 may be skipped. In this case,
control jumps directly from step S110 to step S130.
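
Where XML tags delimit the important concepts, the tagged phrases could be collected with Python's standard `xml.etree.ElementTree` module, as in this sketch. The tag name `concept`, the wrapper element `doc` and the function name `extract_concepts` are hypothetical; the specification does not name any particular tag.

```python
import xml.etree.ElementTree as ET


def extract_concepts(tagged_text):
    """Return the phrases enclosed in <concept> tags, in document order."""
    # Wrap the text in a single root element so it parses as one XML document.
    # Assumption: the text is otherwise well-formed (e.g. '&' is escaped).
    root = ET.fromstring(f"<doc>{tagged_text}</doc>")
    return ["".join(el.itertext()).strip() for el in root.iter("concept")]


# Hypothetical tagged sentence.
tagged = ("The <concept>committee</concept> approved the "
          "<concept>annual budget</concept> on Friday.")
concepts = extract_concepts(tagged)
```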

Once the text has been entered, control continues to step S130 where the user
is prompted to segment the written text into discrete text building units. As discussed
above, text building units are the smallest unit of meaning in the text. A sentence may
include a single text building unit or may include a number of text building units.

In various exemplary embodiments of the methods according to this invention,
in step S130, the user divides the text into the discrete text building units by inserting
a line break so that each text building unit appears on a separate line. It will be
apparent that alternate methods of segmenting the text into text building units are
possible. For example, in various other exemplary embodiments, the text is divided
into text building units by bounding each text building unit within the text by a
selected XML tag or the like.
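
For the XML-bounded variant, a list of text building units could be recovered as sketched here. The element names `text` and `tbu` and the function name `units_from_xml` are assumptions chosen for illustration only.

```python
import xml.etree.ElementTree as ET


def units_from_xml(document):
    """Return the content of each <tbu> element as one text building unit."""
    root = ET.fromstring(document)
    return [(el.text or "").strip() for el in root.iter("tbu")]


# Hypothetical document in which each unit is bounded by a <tbu> tag.
document = (
    "<text>"
    "<tbu>When the bell rang,</tbu> "
    "<tbu>the students left the room</tbu> "
    "<tbu>because the class was over.</tbu>"
    "</text>"
)
units = units_from_xml(document)
```

The result has the same shape as the list produced by line-break segmentation, so later steps need not know which delimiting technique was used.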

Next, in step S140, the user designates the portion of the text that is to be analyzed.
For example, a user may only wish to analyze a problematic paragraph rather than an
entire text. This would occur, for example, when the user had already analyzed the
text one or more times and merely wanted to review the effects that changes made to
the revised paragraph have on the resulting discourse tree. Usually, however, the
entire text will be selected for analysis. Control then continues to step S150.

It will be apparent that, in various exemplary embodiments, one or more of the
steps may be accomplished under programmatic control. For example, for a given
user, an individualized lesson plan can be presented and constantly monitored and
updated, and a specific text or portion of the text can be selected for analysis to
address specific pedagogical requirements of the student.

In step S150, a first text building unit is selected. Then, in step S160, a text
building unit type is assigned to the selected text building unit using the implemented
discourse theory. As discussed above, in various exemplary embodiments, the
Linguistic Discourse Model is the implemented theory of discourse. Comments may
also be added, in step S160, to the analysis of each text building unit as it is assigned.
These comments may serve as temporary notes to the user, may indicate partial
completion, and/or serve to communicate problem areas or pose questions to the
reviewer. Next, in step S170, a determination is made whether the last text building
unit has been assigned a text building unit type. If the current text building unit is not
the last one, control continues to step S180. Otherwise, control jumps to step S190.

In step S180, the next unassigned text building unit is selected. Control then 
returns to step S160 and the loop continues until the last text building unit needing to 
be assigned has been processed. Control then jumps to step S190. 
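
The loop of steps S150 to S180 can be sketched as below. This is a minimal illustration, not the described implementation: the Row class, the assign_types function and the choose callback (which stands in for the user's selection) are hypothetical names, and the three type names follow the content, modifier and operator types named elsewhere in this description.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

UNIT_TYPES = ("content", "modifier", "operator")


@dataclass
class Row:
    """One workspace row: a text building unit, its type and a user comment."""
    text: str
    unit_type: Optional[str] = None
    user_comment: str = ""


def assign_types(rows: list[Row],
                 choose: Callable[[str], tuple[str, str]]) -> None:
    """Visit each unassigned unit in order and record a type and comment.

    `choose` is a stand-in for the user's decision; it receives the unit
    text and returns (unit_type, comment).
    """
    for row in rows:                      # S150 / S180: select next unit
        if row.unit_type is not None:
            continue                      # already assigned, skip it
        unit_type, comment = choose(row.text)
        if unit_type not in UNIT_TYPES:
            raise ValueError(f"unknown text building unit type: {unit_type}")
        row.unit_type = unit_type         # S160: assign the type
        row.user_comment = comment        # S160: optional comment
    # The loop ending corresponds to the S170 test for the last unit.


rows = [Row("John came in through the back door"),
        Row("He walked into the kitchen")]
assign_types(
    rows,
    lambda text: ("content", "Is this a modifier?" if "kitchen" in text else ""),
)
```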

In step S190, once the user has assigned a text building unit type to each of the 
text building units according to the relevant theory of discourse, the work is saved to 
memory. Next, in step S200, the text submitted by the user is opened in the analysis 
workspace for review. The workspace includes the fields seen by the user, including 
the text building unit, type and user comments fields. However, the analysis 
workspace now also includes one or more fields for the reviewer's comments. Any 
queries may be answered by placing an appropriate response in a corresponding one 
or more reviewer comment fields. 

Then, in step S210, a determination is made whether any text building units 
have been inappropriately assigned. If any text building units have been 
inappropriately assigned, control continues to step S220. In step S220, the 
inappropriately assigned text building units are selected. Control then jumps back to 
step S160. Otherwise, control jumps to step S230. 

In step S230, a discourse tree is begun by inserting a node representing a first 
text building unit as the root node of the discourse tree. As discussed above, the 
Linguistic Discourse Model implemented in various exemplary embodiments 
provides three relationships between the text building units: 1) coordinations; 2) 
subordinations; and 3) binaries. 




In step S240, relationships are defined between the text building units already 
in the tree and succeeding text building units. The relationships define a new node 
that is inserted into the tree. The related text building units form child nodes of the 
tree. 

Next, in step S250, a determination is made whether all text building units 
have been added to the discourse tree. If text building units remain to be added, 
control continues to step S260. In step S260, the next text building unit to be added is 
selected. Control then jumps back to step S240. Otherwise, once all text building 
units have been added to the discourse tree, control jumps to optional step S270. 
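
One possible reading of steps S230 to S260 is sketched below: the first unit starts as the root, and each later unit is attached by creating a relation node whose children are an existing node and the new unit. The Node class and the relate function are hypothetical names chosen for illustration, and the choice to wrap the existing node in a new relation node is an assumption about how the insertion might be done.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

RELATIONS = ("coordination", "subordination", "binary")


@dataclass(eq=False)  # compare nodes by identity, since they hold parent links
class Node:
    label: str                                   # unit text or relation name
    children: list[Node] = field(default_factory=list)
    parent: Optional[Node] = field(default=None, repr=False)


def relate(root: Node, target: Node, relation: str, new_unit: str) -> Node:
    """Insert a relation node joining `target` and a new unit (S240).

    Returns the root of the tree, which changes when `target` was the root.
    """
    if relation not in RELATIONS:
        raise ValueError(f"unknown relation: {relation}")
    new_leaf = Node(new_unit)
    rel = Node(relation, children=[target, new_leaf], parent=target.parent)
    if target.parent is None:
        root = rel                               # relation node becomes root
    else:
        siblings = target.parent.children
        siblings[siblings.index(target)] = rel   # take the target's old place
    target.parent = rel
    new_leaf.parent = rel
    return root


first = Node("John came in through the back door")            # S230
root = relate(first, first, "coordination", "He walked into the kitchen")
```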

In step S270, a summary is created based on summary generation algorithms 
for the writing culture under analysis, such as shown in Fig. 3 and discussed below. 

On return, control then continues with optional step S280 where the 
automatically generated summary can be compared with the concepts previously 
designated by the user as important. By comparing coverage of user designated 
important concepts to the automatically generated summary, the user is provided 
objective feedback as to how well the text has conveyed the desired information. 

Control is then transferred to optional step S290 where the measure of 
coverage is displayed. It will be apparent that different automatic measures of 
coverage can be used. For example, displaying the number of words from the 
important concepts that also appear in the automatically generated summary as a 
percentage number gives the user a quick indication of coverage. Similarly, for 
example, displaying in a different color the words found in the important concepts 
but not found in the automatically generated summary gives users a quick indication 
of coverage corresponding to the amount of the colored text displayed. It will be 
apparent that any indication of coverage can be used in the invention. 
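
The word-overlap percentage mentioned above might be computed as in the following sketch. The function names and the matching rule (case-insensitive, exact word match, no stemming) are assumptions made for illustration; the list of missing words is what an embodiment could display in a different color.

```python
import re


def words(text: str) -> list[str]:
    """Lowercased word tokens; a deliberately simple, assumed tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def coverage(important: str, summary: str) -> tuple[float, list[str]]:
    """Return (percent of important-concept words found in the summary,
    list of important-concept words missing from the summary)."""
    concept_words = words(important)
    if not concept_words:
        return 0.0, []
    summary_words = set(words(summary))
    missing = [w for w in concept_words if w not in summary_words]
    covered = len(concept_words) - len(missing)
    return 100.0 * covered / len(concept_words), missing


percent, missing = coverage(
    "John visited his mother",
    "John came in through the back door. "
    "He apologized to his wife for being late.",
)
```

Here two of the four concept words ("john", "his") appear in the summary, so the measure is 50 percent and the missing words are "visited" and "mother".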

The objective, reader centric microanalysis of the information conveyed in the 
written work can be easily compared to the information hoped to be conveyed to the 
target reader. Control is then transferred to step S299 and the process ends. 

Fig. 3 shows an exemplary flowchart of summary generation. Various 
exemplary embodiments of the expository writing teaching system and method 
according to this invention are designed for an English-language expository writing 
culture. It will be apparent that different languages and genres will be summarized 
using different summary algorithms. These different summary algorithms can be 
determined by analyzing the structure of the discourse tree created by the writing 
culture and identifying the text building units of importance. 

Thus, many implementations of the summary generating technique according 
to this invention for different writing cultures will be apparent. For example, the 
summary generator specific to the English Expository writing culture is built upon the 
English expository convention that calls for the most salient information to be 
encoded in text units that are structurally dominant in a text. 

In the English Expository writing culture, information that gives more detail 
about a point is conventionally encoded in a text unit that is embedded or 
subordinated to the more general, more salient information that it elaborates upon. In 
other writing cultures, different strategies will be used. The most salient information 
in a document or a passage may be presented at the end of the text, summing up or 
concluding, as it were, what was said before. 

In a text microanalysis using the Linguistic Discourse Model, the 
characteristic deployment of more or less salient information can be determined and a 
summarization generator devised that extracts information marked as more salient by 
its structural placement and encoding form before extracting information marked as 
less salient. Thus, it will be apparent how summary generation may be adapted to 
create a summary for other languages or writing cultures. 

The English fiction writing culture dictates the use of a different structure from 
the English expository writing culture. For instance, in English fiction, the 
ambiguities identified by the microanalysis process as potentially confusing in an 
English expository writing culture may be required in writing a murder mystery. In a 
murder mystery the ambiguity helps preserve the suspense built up in the text 
narrative. 

The summary generation begins with step S300 of Fig. 3 which reflects an 
exemplary flowchart of English expository summary generation. It will be apparent 
that with the appropriate substitution of a summarization system tailored to the 
specific writing culture, any type of writing in any language may be taught with the 
methods and systems of the invention. Control is then transferred to step S310. 

In step S310, the user enters L, the level of summary desired. The value of the 
summary level L ranges from a lowest level, for example 0, representing the most 
specific level to a highest level P, the most general level. The value of the most 
general level P reflects the maximal number of subordination levels in the discourse 
tree. The summary level may be specified, for example, by a property selection, a 
pop-up dialog box, an option setting or any other method. Next, in step S320, 
the first node in the tree is assigned the lowest rank, for example 0. Then, in step 
S330, the child nodes are recursively selected. Control continues to step S340. In 
step S340, a determination is made whether the selected child node is a subordination. 
If the node is a subordination, control is transferred to step S360. Otherwise, if the 
node is not a subordination, control is transferred to step S350. In step S360, the 
rank of the parent node is assigned to the subordinating node. Then, in step S370, the 
rank of the parent node, incremented by 1, is assigned to the subordinated node. 
Control then continues to step S380. 

If the selected child node is not a subordination, the node must be either a 
coordination or a binary. In either case, in step S350, the selected child node is 
assigned the rank of the parent node. Control then jumps to step S380. 

In step S380, a determination is made as to whether any child nodes remain to 
be processed. If child nodes remain to be processed, control is transferred to step 
S340 and the type determination and ranking steps are repeated for all remaining child 
nodes. When no child nodes remain to be processed, control is transferred to step 
S390. 

In step S390, all nodes of rank less than or equal to the requested summary 
level L are selected and output. Next, in optional step S400, the punctuation may be 
adjusted to create complete sentences. Then, in step S410, control returns to step 
S280. 
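
The ranking and selection of steps S320 to S390 can be sketched as follows. This is one reading of the flowchart, not a definitive implementation: it assumes that the first child of a subordination node is the subordinating node and that any further child is subordinated. The Node class, the function names and the arrangement of the sample tree are illustrative assumptions; the sample sentences are those of Table 1, arranged as the description of Fig. 4 suggests.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                 # "unit", "coordination", "subordination", "binary"
    text: str = ""            # only used by "unit" nodes
    children: list[Node] = field(default_factory=list)
    rank: int = 0


def assign_ranks(node: Node, rank: int = 0) -> None:
    """Recursively rank the tree; the root starts at rank 0 (S320)."""
    node.rank = rank
    for position, child in enumerate(node.children):
        if node.kind == "subordination" and position > 0:
            assign_ranks(child, rank + 1)   # S370: subordinated node
        else:
            assign_ranks(child, rank)       # S350 / S360: parent's rank


def summarize(root: Node, level: int) -> list[str]:
    """Return, in text order, the units whose rank is <= level (S390)."""
    assign_ranks(root)
    selected: list[str] = []

    def visit(node: Node) -> None:
        if node.kind == "unit" and node.rank <= level:
            selected.append(node.text)
        for child in node.children:
            visit(child)

    visit(root)
    return selected


def unit(text: str) -> Node:
    return Node("unit", text)


# Assumed arrangement: coordination 1-6 over subordinations 3-4 and 4-5.
tree = Node("coordination", children=[
    unit("John came in through the back door."),
    unit("He walked into the kitchen."),
    Node("subordination", children=[
        unit("He apologized to his wife for being late."),
        Node("subordination", children=[
            unit("He had to visit his mother."),
            unit("His mother likes him to come by."),
        ]),
    ]),
    unit("His wife said she didn't mind."),
])
level_zero = summarize(tree, 0)
```

Under these assumptions a summary level of 0 selects the four coordinated units and omits both subordinated units, while a level of 2 selects all six.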

The exemplary embodiment describes the summarization in terms of the 
structural representation of a tree. Alternate theories of discourse that can be used in 
the expository writing teaching system 200 may provide different structural 
representations of the text. However, any representational structure in which the 
salient information can be identified can be used. 

Field | 410                                      | 420     | 430           | 440                | 450                     | 460
#     | Text Building Unit                       | Type    | Relation      | Node               | User Comments           | Reviewer Comments
------+------------------------------------------+---------+---------------+--------------------+-------------------------+---------------------------
1     | John came in through the back door       | Content |               |                    |                         |
2     | He walked into the kitchen               | Content | Coordination  | Coordination, 1-6  | Is this a modifier?     |
3     | He apologized to his wife for being late | Content | Coordination  | Coordination, 1-6  |                         |
4     | He had to visit his mother               | Content | Subordination | Subordination, 3-4 | Is this a binary?       |
5     | His mother likes him to come by          | Content | Subordination | Subordination, 4-5 | Is this a coordination? | No, it is a subordination.
6     | His wife said she didn't mind            | Content | Coordination  | Coordination, 1-6  |                         |

Table 1 



Table 1 shows an example of a text segmented into text building units 1-6. 
The text building unit field contains the text building units as segmented by a user. In 
the exemplary embodiment, after text segmentation and prior to the text building unit 
type assignment, only the text building unit, type and user comments fields will be 
visible, since these are the fields used to make the assignments and interact with the 
reviewer or reviewing system. 

When the type assignment is complete and the reviewer opens the file, the 
reviewer comments field is also exposed for view. When the review is complete and 
the corrected work submitted, the user may then view and interact with the relation 
and node fields which are used to link the text building units into the discourse tree. 
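
The staged exposure of workspace fields described above could be captured by a simple lookup, as in this sketch. The stage names, field identifiers and the function visible_fields are hypothetical, and it is an assumption that fields exposed at an earlier stage remain visible at later stages.

```python
# Fields shown to the user while assigning types.
USER_FIELDS = ["text_building_unit", "type", "user_comments"]

# Assumed stage names mapped to the fields visible at each stage.
STAGE_FIELDS = {
    "type_assignment": USER_FIELDS,
    "review": USER_FIELDS + ["reviewer_comments"],
    "tree_building": USER_FIELDS + ["reviewer_comments", "relation", "node"],
}


def visible_fields(stage: str) -> list[str]:
    """Return a copy of the field list for the given workflow stage."""
    return list(STAGE_FIELDS[stage])


fields_for_review = visible_fields("review")
```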

The type field illustrates the assignment of text building unit type for the entry 
"He walked into the kitchen." This is a content text building unit. In various 
exemplary embodiments, the type assignment is entered by positioning the cursor in 
the cell and clicking the mouse. A constrained drop down selection list appears from 
which the content, modifier or operator type assignment may be selected. Once all the 
text building unit type assignments have been made, the file is submitted for review as 
described above. 

After correction and further review as necessary, the text building units can be 
inserted into the structural representation of the discourse. For each successive text 
building unit, the user may specify a relationship in the relation field between the next 
text building unit and text building units that serve as the insertion points in the 
discourse tree. Using the constructed relation nodes discussed above, and the node 
field, the next text building unit may be easily inserted into the discourse tree by 
providing the user with a drop down list of potential insertion points for the next text 
building unit. In some cases a subsequent text building unit may extend the activity 
carried out by text building units already related to one another on the discourse tree 
as siblings. In this case, the incoming text building unit will be added under the 
existing node as a next sibling. 

Fig. 4 shows an exemplary discourse tree for the exemplary list of text 
building units shown in Table 1. The tree also includes the ranks of each node 
generated by summarizing according to the English expository writing culture. The 
first text building unit is linked to the second text building unit by a coordination node 
labeled 1-6. As a result each of the text building units is assigned 0, the rank of the 
parent node. The subordinating node, "He apologized to his wife" inherits the rank of 
the parent node, which is the subordination node 3-4. However, the rank of the 
subordinated node, "He had to visit his mother" is assigned the rank of the parent node 
incremented by one. Similarly the rank of the subordinated node, "His mother likes 
him to come by" is incremented by two. 

Thus, for a summary level of 0, summarizing the discourse tree produces, 
"John came in through the back door. He walked into the kitchen. He apologized to 
his wife for being late. His wife said she didn't mind." The summary would not 
display the subordinations since these text building units are ranked higher than the 
chosen summary level. In this way, the generated summary provides an indication as 
to what information the virtual reader would understand the text to convey. This 
feedback information may then be incorporated by the user into future revisions of the 
text. 

While this invention has been described in conjunction with the specific 
embodiments outlined above, it is evident that many alternatives exist. For example, 
the expository writing teaching system 200 may be used with classical Rhetorical 
Structures Theory of discourse analysis. The system allows for the segmenting of a 
text into the text building units of the Rhetorical Structures Theory or any other 
theory of discourse. In the Rhetorical Structures Theory the discourse is built up from 
nucleus and satellite units that reflect situations presented. 

Relations defined between text building units in Rhetorical Structures Theory 
might rely on such classical Rhetorical Structures Theory relations as: circumstance; 
condition; elaboration; evaluation; interpretation; means; non-volitional cause; 
non-volitional result; otherwise; purpose; solutionhood; unconditional; unless; 
volitional cause and volitional result as subject matter relations. Presentational 
relations in classical Rhetorical Structures Theory include: antithesis; background; 
concession; enablement; evidence; justify; motivation; preparation; restatement; and 
summary. 

The analytic process of building the structural representation of discourse 
using any theory of discourse allows the user to understand their text as viewed 
through that analytic framework. Thus, the system and methods may be adapted to 
Systemic Functional Grammar, Tagmemics or any other known or later developed 
theory of discourse analysis to teach a user how to write using these theories of 
discourse analysis. 

It should be understood that the expository writing teaching system 200 can be 
implemented on a programmed general purpose computer. However, the expository 
writing teaching system 200 can also be implemented on a special purpose computer, 
a programmed microprocessor or microcontroller and peripheral integrated circuit 
elements, an ASIC or other integrated circuit, a digital signal processor, a hardwired 
electronic or logic circuit such as a discrete element circuit, a programmable logic 
device such as a PLD, PLA, FPGA or PAL, or the like. In general, any device 
capable of implementing a finite state machine that is in turn capable of implementing 
the flowcharts shown in Figs. 2 and 3 may be used to implement the expository writing 
teaching system 200. 

It will be apparent that the processing circuits or routines of expository writing 
teaching system 200 may be used in collaborative networked learning environments in 
which some or all of the expository writing teaching system 200 components 
including the circuits or routines may be distributed throughout the network 
collaborative environment or located centrally. The environment may include web 
servers, application servers, or any other type of computer system. In general, the 
system may be used alone or in combination with any number of other general 
programmable or special purpose computers or components and in any type of 
environment including networked collaborative learning environments, client server 
environments and web based environments. 

Modifications and variations will be apparent to those skilled in the art. 
Accordingly, the exemplary embodiments of the invention, as set forth above, are 
intended to be illustrative, not limiting. Various changes may be made without 
departing from the spirit and scope of the invention. 



